High-Speed Function Approximation using a Minimax Quadratic Interpolator
A table-based method for high-speed function approximation in single-precision floating-point format is presented in this paper. Our focus is the approximation of reciprocal, square root, square root reciprocal, exponentials, logarithms, trigonometric functions, powering (with a fixed exponent p), or special functions. The algorithm presented here combines table look-up, an enhanced minimax quadratic approximation, and an efficient evaluation of the second-degree polynomial (using a specialized squaring unit, redundant arithmetic, and multioperand addition). The execution times and area costs of an architecture implementing our method are estimated, showing that it achieves the fast execution times of linear approximation methods and the reduced area requirements of other second-degree interpolation algorithms. Moreover, the enhanced minimax approximation takes into account, through an iterative process, the effect of rounding the polynomial coefficients to a finite size. This allows a further reduction in the size of the look-up tables, making our method very suitable for the implementation of an elementary function generator in state-of-the-art DSPs or graphics processing units (GPUs).
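The combination of table look-up and second-degree evaluation described in this abstract can be illustrated with a small software sketch. It is not the paper's method: the coefficients below come from interpolation at Chebyshev nodes (a common near-minimax stand-in) rather than the enhanced minimax procedure, coefficient rounding is not modelled, and the polynomial is evaluated with Horner's rule in double precision rather than with a squaring unit and multioperand adder. The names chebyshev_quadratic, build_table and evaluate, and the choice of 7 index bits, are hypothetical.

```python
import math


def chebyshev_quadratic(f, a, b):
    """Quadratic through three Chebyshev nodes of [a, b], in powers of (x - a)."""
    mid, half = (a + b) / 2, (b - a) / 2
    xs = [mid + half * math.cos((2 * k + 1) * math.pi / 6) for k in range(3)]
    ys = [f(x) for x in xs]
    # Newton divided differences for the interpolating parabola.
    d01 = (ys[1] - ys[0]) / (xs[1] - xs[0])
    d12 = (ys[2] - ys[1]) / (xs[2] - xs[1])
    d012 = (d12 - d01) / (xs[2] - xs[0])
    # Re-express y0 + d01*(x - x0) + d012*(x - x0)*(x - x1) in t = x - a.
    u0, u1 = xs[0] - a, xs[1] - a
    c2 = d012
    c1 = d01 - d012 * (u0 + u1)
    c0 = ys[0] - d01 * u0 + d012 * u0 * u1
    return c0, c1, c2


def build_table(f, index_bits):
    """One (c0, c1, c2) triple per segment of [1, 2); 2**index_bits segments."""
    n = 1 << index_bits
    return [chebyshev_quadratic(f, 1 + i / n, 1 + (i + 1) / n) for i in range(n)]


def evaluate(table, x, index_bits):
    """Approximate f(x) for x in [1, 2): upper bits pick the row, lower bits are t."""
    n = 1 << index_bits
    i = int((x - 1) * n)          # table index from the leading fraction bits
    t = x - (1 + i / n)           # remaining low-order part of the argument
    c0, c1, c2 = table[i]
    return c0 + t * (c1 + t * c2)  # Horner evaluation of the quadratic


if __name__ == "__main__":
    table = build_table(lambda x: 1.0 / x, 7)
    x = 1.2345
    print(evaluate(table, x, 7), 1.0 / x)
```

With 128 segments the interpolation error for the reciprocal on [1, 2) stays below 2^-24 in this sketch, which suggests why a quadratic scheme needs far fewer table entries than a linear one for single-precision targets.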
FP8 Formats for Deep Learning
FP8 is a natural progression for accelerating deep learning training and
inference beyond the 16-bit formats common in modern processors. In this paper
we propose an 8-bit floating point (FP8) binary interchange format consisting
of two encodings - E4M3 (4-bit exponent and 3-bit mantissa) and E5M2 (5-bit
exponent and 2-bit mantissa). While E5M2 follows IEEE 754 conventions for
representation of special values, E4M3's dynamic range is extended by not
representing infinities and having only one mantissa bit-pattern for NaNs. We
demonstrate the efficacy of the FP8 format on a variety of image and language
tasks, effectively matching the result quality achieved by 16-bit training
sessions. Our study covers the main modern neural network architectures - CNNs,
RNNs, and Transformer-based models, leaving all the hyperparameters unchanged
from the 16-bit baseline training sessions. Our training experiments include
large, up to 175B parameter, language models. We also examine FP8
post-training-quantization of language models trained using 16-bit formats that
resisted fixed-point int8 quantization.
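The two encodings can be made concrete with a small decoder. This is a sketch under stated assumptions, not a reference implementation: the exponent biases (7 for E4M3, 15 for E5M2) are not given in the abstract and are assumed here from the IEEE-style convention 2^(e-1) - 1, and the function name decode_fp8 is hypothetical. The special-value handling follows the abstract: E5M2 keeps IEEE 754-style infinities and NaNs, while E4M3 has no infinities and reserves a single mantissa pattern (all ones) for NaN.

```python
import math


def decode_fp8(byte, fmt="E4M3"):
    """Decode one FP8 byte (0..255) to a Python float."""
    if fmt == "E4M3":
        exp_bits, man_bits, bias = 4, 3, 7    # bias is an assumption
    elif fmt == "E5M2":
        exp_bits, man_bits, bias = 5, 2, 15   # bias is an assumption
    else:
        raise ValueError("fmt must be 'E4M3' or 'E5M2'")

    sign = -1.0 if (byte >> 7) & 1 else 1.0
    exp = (byte >> man_bits) & ((1 << exp_bits) - 1)
    man = byte & ((1 << man_bits) - 1)
    exp_all_ones = (1 << exp_bits) - 1
    man_all_ones = (1 << man_bits) - 1

    if fmt == "E5M2" and exp == exp_all_ones:
        # IEEE 754 convention: zero mantissa is infinity, anything else NaN.
        return sign * math.inf if man == 0 else math.nan
    if fmt == "E4M3" and exp == exp_all_ones and man == man_all_ones:
        # The only NaN pattern; other all-ones-exponent codes are normal numbers.
        return math.nan
    if exp == 0:
        # Subnormal: no implicit leading one.
        return sign * man * 2.0 ** (1 - bias - man_bits)
    return sign * (1 + man / (1 << man_bits)) * 2.0 ** (exp - bias)


if __name__ == "__main__":
    print(decode_fp8(0x7E, "E4M3"))  # largest finite E4M3 value
    print(decode_fp8(0x7B, "E5M2"))  # largest finite E5M2 value
```

Under these assumptions the largest finite E4M3 value is 448, compared with 240 if the whole top exponent were reserved for special values, which is the range extension the abstract refers to.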
"Older Adults with ASD: The Consequences of Aging." Insights from a series of special interest group meetings held at the International Society for Autism Research 2016-2017
A special interest group (SIG) entitled "Older Adults with ASD: The Consequences of Aging" was held at the International Society for Autism Research (INSAR) annual meetings in 2016 and 2017. The SIG and subsequent meetings brought together, for the first time, international delegates who were members of the autistic community, researchers, practitioners and service providers. Based on aging autism research that is already underway in the UK, Europe, Australia and North America, discussions focussed on conceptualising the parameters of aging when referring to autism, and the measures that are appropriate to use with older adults when considering diagnostic assessment, cognitive factors and quality of life in older age. Thus, the aim of this SIG was to progress the research agenda on current and future directions for autism research in the context of aging. A global issue on how to define 'aging' when referring to ASD was at the forefront of discussions. The 'aging' concept can in principle refer to all developmental transitions. However, in this paper we focus on the cognitive and physical changes that take place from mid-life onwards. Accordingly, it was agreed that aging and ASD research should focus on adults over the age of 50 years, given the high rates of co-occurring physical and mental health concerns and increased risk of premature death in some individuals. Moreover, very little is known about the cognitive change, care needs and outcomes of autistic adults beyond this age. Discussions on the topics of diagnostic and cognitive assessments, and of quality of life and well-being were explored through shared knowledge about which measures are currently being used and which background questions should be asked to obtain comprehensive and informative developmental and medical histories.
Accordingly, a survey was completed by SIG delegates who were representatives of international research groups across four continents, and who are currently conducting studies with older autistic adults. Considerable overlap was identified across different research groups in measures of both autism and quality of life, which pointed to combining data and shared learnings as the logical next step. Regarding the background questions that were asked, the different research groups covered similar topics but the groups differed in the way these questions were formulated when working with autistic adults across a range of cognitive abilities. It became clear that continued input from individuals on the autism spectrum is important to ensure that questionnaires used in ongoing and future studies are accessible and understandable for people across the whole autistic spectrum, including those with limited verbal abilities.
Design Issues In High Performance Floating Point Arithmetic Units
In recent years computer applications have increased in their computational complexity. The industry-wide usage of performance benchmarks, such as SPECmarks, forces processor designers to pay particular attention to implementation of the floating point unit, or FPU. Special purpose applications, such as high performance graphics rendering systems, have placed further demands on processors. High speed floating point hardware is a requirement to meet these increasing demands. This work examines the state-of-the-art in FPU design and proposes techniques for improving the performance and the performance/area ratio of future FPUs. In recent FPUs, emphasis has been placed on designing ever-faster adders and multipliers, with division receiving less attention. The design space of FP dividers is large, comprising five different classes of division algorithms: digit recurrence, functional iteration, very high radix, table look-up, and variable latency. While division is an infrequent operation..